Lab 2. BIOE 515: Landscape Ecology & Management.
Types of spatial data.
Background and goals.
Throughout this course, we will be working with different kinds of spatial data, so it is important to develop fluency with the formats and types of data. We will first cover these formats and types, then combine different formats to conduct some simple analyses in ArcGIS Pro. At the end of the lab you will: (1) know a little bit about geographic projections, (2) be able to describe different data formats and types, and (3) be able to conduct several simple but powerful spatial analyses using ArcGIS Pro. Please use the template provided to organize your lab write-up.
Geographic projections and coordinate reference systems
This lab will focus on the types of data that landscape ecologists and spatial analysts typically work with, but first we need to understand the importance of geographic coordinate systems and projections. In R these are called coordinate reference systems (CRS). A coordinate reference system is a mathematical way of representing our three-dimensional world on a two-dimensional map. Managing projections can be a challenge, and you will have to address them whenever you work with spatial data. We won’t cover them in depth here, but we will next week. The video below helps explain why maps of the same area can look so different.
If the video player doesn’t work, watch the video here: https://www.youtube.com/watch?v=kIID5FDi2JQ
Different formats and types of data.
Spatial data will be one of three formats:
(1) tables or tabular data (e.g., spreadsheets, data frames, attribute tables),
(2) vector (e.g., features, shapefiles, polygons), and
(3) raster (e.g., gridded map of pixels).
Raster and vector data will sometimes be called different kinds of “spatial data models”, but - given the variety of ways ecologists use “model” - I will avoid that here.
Below are examples of different kinds or formats of data.
Data organized and shared in these three formats (tabular, vector, raster) can be qualitative (categorical) or quantitative (numeric). This is a helpful chart from this online book. It is important to consider what kind of data you are working with and how the data can be combined to ask interesting questions.
Tables can obviously hold data of different kinds. Consider a table of field samples (e.g., “plots”) collected in a forest. Each plot might have data collected on the type of forest (e.g., aspen, Douglas-fir, lodgepole pine), the age of the forest (1 to 100s of years), number of trees for each species within the sample, number of species within each sample, average height of the forest stand, average seasonal temperature, etc. All of these data could be classified using the framework above.
Table of fictional forest plots with different kinds of data.
| PlotID | Forest type | Elevation (m) | Stand age (yrs) | Tree density (trees per plot) | Tree richness (species per plot) | Stand height (m) | Avg summer temp (°C) | Wildfire severity |
|---|---|---|---|---|---|---|---|---|
| 1A | Aspen | 1500.5 | 60 | 800 | 6 | 18.4 | 16.2 | Low |
| 2A | Douglas-fir | 1900.1 | 120 | 600 | 8 | 25.1 | 14.4 | Unburned |
| 3A | Aspen | 2100.3 | 70 | 700 | 5 | 15.9 | 13.8 | Moderate |
| 4A | Lodgepole pine | 2400.8 | 90 | 1200 | 4 | 20.7 | 12.3 | High |
Data type of each variable in the table above (fill in the blanks).

| Variable | Data type (nominal, ordinal, discrete, or continuous?) |
|---|---|
| PlotID | Nominal |
| Forest type | __________________________ |
| Elevation (m) | Continuous |
| Stand age (yrs) | __________________________ |
| Tree density (trees per plot) | __________________________ |
| Tree richness (species per plot) | Discrete |
| Stand height (m) | __________________________ |
| Avg summer temp (°C) | __________________________ |
| Wildfire severity | __________________________ |
| __________________________ | nominal |
| __________________________ | ordinal |
| __________________________ | discrete |
| __________________________ | continuous |
Raster data can represent both categorical and quantitative data. Consider land cover data, where each pixel (a small square of known size) is assigned a land cover category (forest, grassland, developed). Raster data can also hold continuous quantitative data like elevation, total ecosystem carbon (mass of carbon per area), or an index of habitat suitability. The example below shows both: the National Land Cover Database (NLCD) (categorical land cover types, top) and elevation (meters above sea level, bottom).
Vector data are either polygons (states, wilderness areas, discrete maps of a species distribution), lines (streams, roads), or points (locations of plots or samples). Vector data can also be used to represent categorical or continuous data as in the example below showing the polygons of level 4 ecoregions (categorical regions, top) and mean species richness within those polygons (average number of species).
Consider how categorical data can be either vector (left, below) or raster (right, below). Image borrowed from Kim With’s Essentials of Landscape Ecology (Figure 4.14). It is also possible to transform data from vector to raster; we will do this in an R exercise next week.
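One way to picture vector-to-raster conversion is to ask, for every grid cell, which polygon contains the cell's center. The sketch below is a minimal plain-Python illustration of that idea, not the method any particular GIS uses. It assumes the polygons are axis-aligned rectangles so the containment test stays trivial; the polygon extents, grid size, and the helper `category_at` are all hypothetical.

```python
# Hypothetical categorical polygons, simplified to axis-aligned rectangles:
# (category, x_min, y_min, x_max, y_max) in map units.
polygons = [
    ("forest", 0.0, 0.0, 2.0, 3.0),
    ("grassland", 2.0, 0.0, 4.0, 3.0),
]

CELL_SIZE = 1.0          # assumed cell width and height
N_ROWS, N_COLS = 3, 4    # assumed grid dimensions
Y_MAX = 3.0              # northern edge of the grid; x starts at 0.0


def category_at(x, y):
    """Return the category of the first rectangle containing (x, y)."""
    for category, x_min, y_min, x_max, y_max in polygons:
        if x_min <= x < x_max and y_min <= y < y_max:
            return category
    return None  # plays the role of NoData


# Assign each cell the category found at its center.
raster = []
for row in range(N_ROWS):
    y_center = Y_MAX - (row + 0.5) * CELL_SIZE
    raster.append(
        [category_at((col + 0.5) * CELL_SIZE, y_center) for col in range(N_COLS)]
    )

for row in raster:
    print(row)
```

The cell size controls how faithfully the grid follows the polygon boundaries: coarser cells blur the edges, which is one reason the same categorical map can look different as vector and as raster.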
Data of different types can be related spatially through shared locations (e.g., plots collected within a national park where total ecosystem carbon has been estimated). Data of different kinds can also be combined based on relationships between attributes. Consider relating a database on the traits of different species (e.g., maximum height, lifespan, breeding habitat) to data on samples of those species collected in plots: you can link the species observed in each plot with the traits of those species. We will not spend much time on relational databases, but I recommend learning some data wrangling skills (e.g., left_join in R’s dplyr package).
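The attribute-based join described above can be sketched in a few lines. This is a minimal illustration in plain Python (so it runs without extra packages), not the R workflow itself. The species labels, trait values, and the helper `left_join` are hypothetical. It mimics the behavior of a left join: every plot observation is kept, and trait columns are attached wherever the species matches.

```python
# Hypothetical trait table keyed by species (illustrative values only).
traits = {
    "Species A": {"max_height_m": 20, "lifespan_yr": 400},
    "Species B": {"max_height_m": 25, "lifespan_yr": 120},
}

# Hypothetical plot observations: one row per species observed in a plot.
observations = [
    {"plot_id": "1A", "species": "Species B"},
    {"plot_id": "4A", "species": "Species A"},
    {"plot_id": "4A", "species": "Species C"},  # no trait record exists
]


def left_join(rows, lookup, key):
    """Keep every row; attach lookup columns where the key matches."""
    # Collect all lookup column names so unmatched rows still get them.
    lookup_cols = sorted({col for rec in lookup.values() for col in rec})
    joined = []
    for row in rows:
        match = lookup.get(row[key], {})
        new_row = dict(row)
        for col in lookup_cols:
            new_row[col] = match.get(col)  # None plays the role of NA
        joined.append(new_row)
    return joined


joined = left_join(observations, traits, key="species")
for row in joined:
    print(row)
```

Note that the unmatched species is kept with missing trait values rather than dropped, which is the property that distinguishes a left join from an inner join.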
When you know how to work with different kinds of data of different types, you can find creative ways of combining disparate datasets to ask really interesting research questions.
ArcGIS Pro exercises
We’re going to do a few simple queries and summaries using data clipped to the Greater Yellowstone Ecosystem. Open ArcGIS Pro, navigate to the Lab 2 folder, and bring in data.
Open ArcGIS Pro, sign in with your NetID, and start a new project with a Map. Give the new project a reasonable name (e.g., Lab 2 - Sept 2, 2025).
When your map opens, you can either click on “Add Data” to add all of the layers from the lab folder OR you can find the “Catalog” window on the far right, select Computer, navigate to the folder, and drag and drop all layers into the map.
Bring in all of the data, look at each layer, and consider what kind of data these layers represent. Click off all layers in the Contents pane and examine each dataset individually. Zoom in and out and click around and customize the symbology in different ways. You could spend all day doing this, but spend 5 or 10 minutes getting to know the data and changing the symbology of some of the layers.
Right-click on the layer name and select Attribute Table. This will open the attribute table for the dataset, if it has one. The attribute table considered by itself is what kind of data?
| Filename | Description | Source | Type (raster or vector?) |
|---|---|---|---|
| elevation_GYE.tif | Digital elevation model | LANDFIRE | __________________ |
| FIAdataV2_basal_area_ft2_per_acre.shp | Forest Inventory and Analysis plots | Forest Service FIA | __________________ |
| GAP_vertebrate_richness_GYE.tif | Richness of terrestrial vertebrates | USGS GAP | __________________ |
| GYE_boundary.shp | Greater Yellowstone Ecosystem boundary | Greater Yellowstone Coordinating Cmte | __________________ |
| NLCD_2024_GYE.tif | National Land Cover Database | USGS NLCD | __________________ |
| NorthAmericanRivers_GYE.shp | Major rivers | Commission for Environmental Cooperation | __________________ |
| PADUS4_1VectorAnalysis_GYE.shp | Protected Areas Database | USGS GAP | __________________ |
| us_eco_l4_GYE.shp | Ecoregions (level 4) | EPA | __________________ |
| usfs_carbon_total_initial_tons_per_acre_GYE.tif | Total forest carbon | USFS Firelab | __________________ |
Exercise 1: What is the landscape composition of the GYE?
We want to know how much of the Greater Yellowstone Ecosystem is forest, grassland, shrubland, and developed land (i.e., the composition of the landscape). This is a fundamental landscape measure discussed by Noss (1990) and shown in the upper-right of his diagram.
Click off all layers except for the land cover data (NLCD_2024_GYE.tif) and make sure you can see the different cover types in the Contents. One very basic question of landscape ecology is: “what is the composition of the landscape?” In other words, how much is there of each land cover type? We can assess this on an absolute (total hectares or acres) or relative basis (% of the landscape). Before we use the GIS to calculate the relative amount of each land cover type, look at the map and see if you can guess how much of the Greater Yellowstone Ecosystem is forest, grassland, developed, etc.
Now, let’s calculate the percent area occupied by each cover type (i.e., the composition of the landscape).
Right-click on NLCD_2024_GYE.tif and select Open Attribute Table. The table includes “Value”, which looks numeric but is actually a nominal variable: it is the land cover code for each named land cover class (NLCD_Land, the right-hand column). The table also includes “Count”, which is the number of pixels within that land cover class.
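With the “Count” column in hand, composition is simple arithmetic: each class's share is its pixel count divided by the total, and its absolute area is the count times the area of one pixel. The sketch below shows that calculation in plain Python. The class labels and counts are made up for illustration (they are not the GYE values), and the 30 m cell size is an assumption that should be checked against the raster's properties.

```python
# Hypothetical pixel counts per land cover class (not the real GYE values).
counts = {
    "Evergreen Forest": 500_000,
    "Shrub/Scrub": 300_000,
    "Grassland/Herbaceous": 150_000,
    "Developed": 50_000,
}

CELL_SIZE_M = 30  # assumed pixel width in meters; verify for your raster
HA_PER_PIXEL = CELL_SIZE_M ** 2 / 10_000  # 1 hectare = 10,000 square meters

total_pixels = sum(counts.values())

# Absolute (hectares) and relative (percent) composition for each class.
composition = {}
for land_cover, n_pixels in counts.items():
    composition[land_cover] = {
        "hectares": n_pixels * HA_PER_PIXEL,
        "percent": 100 * n_pixels / total_pixels,
    }

for land_cover, stats in composition.items():
    print(f"{land_cover}: {stats['hectares']:.0f} ha, {stats['percent']:.1f}%")
```

The percentages always sum to 100, which is a quick check that no class was left out of the total.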
Exercise 2: Where are the whitebark pines in the GYE?
The Forest Inventory and Analysis (FIA) program is a national monitoring effort that “collects, processes, analyzes, and reports on data necessary for assessing the extent and condition of forest resources in the United States.” I have downloaded and pre-processed FIA plots to create a plot x species matrix, a common way to organize data on ecological communities. The data are in the file FIAdataV2_basal_area_ft2_per_acre.shp. This shapefile includes the locations of FIA plots and an attribute table where rows are FIA plots, columns are species, and cells hold the total basal area for that species in that plot. See this cartoon explaining basal area (the cross-sectional area of tree trunks). Basal area is a common way to measure tree abundance in forests (density, or trees per area, is the other).
One important note on FIA plot locations: they are intentionally offset (i.e., “fuzzed” and in some cases “swapped”) to protect the privacy of landowners and forest resources. This complicates some analyses. Studies have shown that many patterns are robust to this fuzzing, but see this paper too.
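To make basal area and the plot x species matrix concrete, here is a small plain-Python sketch. It is not the pre-processing actually used for the lab data. The tree records and the column name `Abies_lasi` are hypothetical, and the result is basal area summed per plot (square feet), not scaled to a per-acre value, because that scaling depends on plot design details not covered here.

```python
import math

# Hypothetical tree records: (plot ID, species, trunk diameter in inches).
trees = [
    ("P1", "Pinus_albi", 10.0),
    ("P1", "Pinus_albi", 14.0),
    ("P1", "Abies_lasi", 8.0),
    ("P2", "Abies_lasi", 12.0),
]


def basal_area_ft2(dbh_in):
    """Cross-sectional area of a trunk (square feet) from diameter in inches."""
    radius_ft = dbh_in / 2 / 12  # convert inches to feet
    return math.pi * radius_ft ** 2


plots = sorted({plot for plot, _, _ in trees})
species = sorted({sp for _, sp, _ in trees})

# Rows are plots, columns are species, cells are summed basal area.
matrix = {plot: {sp: 0.0 for sp in species} for plot in plots}
for plot, sp, dbh_in in trees:
    matrix[plot][sp] += basal_area_ft2(dbh_in)

for plot in plots:
    print(plot, {sp: round(ba, 2) for sp, ba in matrix[plot].items()})
```

A species that was not recorded in a plot gets a zero in that cell, which is what makes "greater than 0" a presence test later on.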
We will use FIA data to investigate broad patterns of whitebark pine (Pinus albicaulis) in the Greater Yellowstone Ecosystem. Let’s first quickly select which FIA plots have whitebark pine in them. We’ll use “Select By Attributes” to identify every plot where basal area of whitebark pine is greater than 0. Make sure the attribute table of the FIA data is open and click on “Select By Attributes” circled in red below.
In the Expression section, build the expression “Pinus_albi” (Pinus albicaulis, whitebark pine) is greater than 0 and click OK. This will select the FIA plots that have some whitebark pine and turn their symbols in the map cyan (the color that usually indicates a feature is selected).
Notice the button and information I have circled in red below. The button filters the attribute table to only those rows that were selected. You can also see how many of the rows were selected from the total. This can be a quick and easy way of calculating a high-level result (i.e., how many or what proportion of your data include a select species, like whitebark pine?).
You can also right-click on any species column and sort descending; for example, sorting “Pinus_albi” shows which plot has the highest whitebark pine basal area.
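The select, count, and sort steps above amount to a filter, a ratio, and a sort. The following plain-Python sketch mirrors them on a made-up attribute table. Only the column name `Pinus_albi` comes from the lab data; the plot IDs and basal area values are hypothetical.

```python
# Hypothetical attribute table rows (basal area in square feet per acre).
fia_plots = [
    {"plot_id": 1, "Pinus_albi": 0.0},
    {"plot_id": 2, "Pinus_albi": 35.2},
    {"plot_id": 3, "Pinus_albi": 0.0},
    {"plot_id": 4, "Pinus_albi": 12.7},
    {"plot_id": 5, "Pinus_albi": 0.0},
]

# Equivalent of Select By Attributes with the expression Pinus_albi > 0.
selected = [row for row in fia_plots if row["Pinus_albi"] > 0]

# Equivalent of reading how many rows were selected out of the total.
proportion = len(selected) / len(fia_plots)
print(f"{len(selected)} of {len(fia_plots)} selected ({proportion:.0%})")

# Equivalent of sorting the species column in descending order.
ranked = sorted(fia_plots, key=lambda row: row["Pinus_albi"], reverse=True)
print("Highest basal area:", ranked[0])
```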
Exercise 3: What’s the relationship between tree and vertebrate diversity, ecosystem carbon, and elevation in the Greater Yellowstone Ecosystem?
We have pulled in other datasets that can be combined to ask pretty cool research questions like “what is the relationship between elevation, total ecosystem carbon, total tree basal area, tree richness, and vertebrate richness?” There are lots of conceptual models and hypotheses to describe the causes and consequences of relationships among some of these variables. For instance, do sites with high tree diversity also support higher diversity of terrestrial vertebrates? Do sites with more species store more carbon? Do sites with more carbon support more terrestrial vertebrates? How do these patterns change with elevation? The ecological questions could go on and on with just the few datasets available to use here.
We already have FIA plots overlaid with data on elevation, total forest carbon, vertebrate species richness, etc. I have already calculated tree richness in the FIA data by counting the non-zero columns (i.e., how many species were found to be present in each plot?). Go up to Analysis and click Tools (the icon looks like a toolbox). This will open a Geoprocessing panel on the right.
In the search bar, which says Find Tools, type in “Extract Multi Values to Points”.
Make sure the Input point features shows the FIA data and select the carbon, vertebrate richness, and elevation raster data. This will essentially extract the values from each raster and append them to the FIA attribute table, allowing us to do more statistical analysis with these data. I’ve tweaked the names in the “Output field name” a bit to clean up the new variable names. Click Run.
If that runs successfully, check the attribute table in the FIA data to see if the new data now appear. The new variables on carbon, vertebrate richness, and elevation will be on the far right side of your attribute table. You’ll need to scroll over to see them.
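Conceptually, the extraction step looks up the raster cell under each point and copies its value into the point's attributes. The sketch below illustrates that idea in plain Python on a tiny made-up grid. It is not how ArcGIS Pro implements the tool, and the grid origin, cell size, elevation values, point coordinates, field name `elev_m`, and the helper `extract_value` are all hypothetical.

```python
import math

# Hypothetical 3 x 4 elevation raster (meters); row 0 is the northern edge.
elevation = [
    [2400, 2350, 2300, 2250],
    [2200, 2150, 2100, 2050],
    [2000, 1950, 1900, 1850],
]
X_MIN, Y_MAX = 500_000.0, 4_900_000.0  # upper-left corner (map units)
CELL_SIZE = 30.0                       # cell width and height (map units)


def extract_value(raster, x, y):
    """Return the value of the cell containing (x, y), or None if outside."""
    col = math.floor((x - X_MIN) / CELL_SIZE)
    row = math.floor((Y_MAX - y) / CELL_SIZE)
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return None


# Hypothetical plot locations in the same coordinate system as the raster.
points = [
    {"plot_id": 1, "x": 500_010.0, "y": 4_899_990.0},
    {"plot_id": 2, "x": 500_100.0, "y": 4_899_930.0},
    {"plot_id": 3, "x": 499_000.0, "y": 4_899_990.0},  # outside the raster
]

# Append the extracted value to each point's attributes.
for point in points:
    point["elev_m"] = extract_value(elevation, point["x"], point["y"])
    print(point)
```

The lookup only makes sense when the points and the raster share a coordinate reference system, which is one practical reason projections matter for this kind of analysis.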
Now, let’s investigate the relationships we asked about above using simple scatter plots, which we can do right in ArcGIS Pro. I do not recommend conducting “real” analyses in ArcGIS Pro, but sometimes a fast data exploration is valuable. To create the scatter plots, right-click on the FIA data, select Create Chart, and Scatter Plot.
Next week we will be doing more of this kind of analysis with “raster stacks” in R using the terra, dplyr, and ggplot2 packages.
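A scatter plot is often summarized with a correlation coefficient. As a hedged illustration of what that number captures, the sketch below computes Pearson's r by hand in plain Python. The elevation and richness values are made up, not taken from the lab data, and the helper `pearson_r` is hypothetical.

```python
import math

# Hypothetical extracted values for five plots (not real lab data).
elevation_m = [1800, 2000, 2200, 2400, 2600]
tree_richness = [5, 4, 4, 2, 1]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(elevation_m, tree_richness)
print(f"Pearson r = {r:.2f}")
```

A value near -1 or 1 indicates a strong linear trend and a value near 0 indicates little linear relationship, so it is worth looking at the plot itself before trusting the number.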